Semiautomatic Acquisition of Semantic Structures for Understanding Domain-Specific Natural Language Queries

Authors

  • Helen M. Meng
  • Kai-Chung Siu
Abstract

This paper describes a methodology for semiautomatic grammar induction from unannotated corpora of information-seeking queries in a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive to (spoken) natural language understanding. Our work aims to ameliorate the reliance of grammar development on expert handcrafting or on the availability of annotated corpora. To strive for reasonable coverage on real data, as well as portability across domains and languages, we adopt a statistical approach. Agglomerative clustering using the symmetrized divergence criterion groups words "spatially." These words have similar left and right contexts and tend to form semantic classes. Agglomerative clustering using mutual information groups words "temporally." These words tend to co-occur sequentially to form phrases or multiword entities. Our approach is amenable to the optional injection of prior knowledge to catalyze grammar induction. The resultant grammar is interpretable by humans and is amenable to hand-editing for refinement. Hence, our approach is semiautomatic in nature. Experiments were conducted using the ATIS (Air Travel Information Service) corpus and the semiautomatically-induced grammar G_SA is compared to an entirely handcrafted grammar G_H. G_H took two months to develop and gave concept error rates of 7 percent and 11.3 percent, respectively, in language understanding of two test corpora. G_SA took only three days to produce and gave concept errors of 14 percent and 12.2 percent on the corresponding test corpora. These results provide a desirable trade-off between language understanding performance and grammar development effort.
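
The two clustering criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy corpus, the additive smoothing constant `alpha`, the sentence-boundary markers, and all function names are assumptions made for illustration. Pointwise mutual information over adjacent word pairs stands in for the mutual information criterion, whose exact form the abstract does not give. A full agglomerative pass would repeatedly merge the best-scoring pair and recompute the statistics.

```python
import math
from collections import Counter, defaultdict

def context_counts(sentences):
    """Count the left and right neighbours of every word."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]  # boundary markers (assumed)
        for i in range(1, len(tokens) - 1):
            left[tokens[i]][tokens[i - 1]] += 1
            right[tokens[i]][tokens[i + 1]] += 1
    return left, right

def smoothed(counter, vocab, alpha=0.1):
    """Additively smoothed distribution over vocab (alpha is an assumption)."""
    total = sum(counter.values()) + alpha * len(vocab)
    return {v: (counter[v] + alpha) / total for v in vocab}

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared support."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)

def symmetrized_divergence(w1, w2, left, right, vocab):
    """Sum of D(p||q) + D(q||p) over left and right context distributions."""
    total = 0.0
    for ctx in (left, right):
        p, q = smoothed(ctx[w1], vocab), smoothed(ctx[w2], vocab)
        total += kl(p, q) + kl(q, p)
    return total

def pointwise_mi(sentences):
    """Pointwise mutual information for each adjacent word pair."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = sent.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {
        (a, b): math.log((c / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), c in bigrams.items()
    }

# Toy air-travel queries (invented for illustration, not taken from ATIS).
corpus = [
    "show flights from boston to denver",
    "show flights from denver to boston",
    "list flights from boston to new york",
    "list fares from new york to denver",
]
left, right = context_counts(corpus)
vocab = sorted({w for s in corpus for w in s.split()} | {"<s>", "</s>"})
mi = pointwise_mi(corpus)

# "Spatial" grouping: a lower divergence suggests the same semantic class.
d_city = symmetrized_divergence("boston", "denver", left, right, vocab)
d_other = symmetrized_divergence("boston", "flights", left, right, vocab)
print(d_city < d_other)

# "Temporal" grouping: a higher score suggests a multiword entity.
print(mi[("new", "york")] > mi[("to", "denver")])
```

On this toy data the two city names come out closer to each other than a city name is to "flights", and "new york" scores higher than "to denver", which matches the intuition behind the semantic classes and multiword entities described above.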


Similar resources

Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information

With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...


Semiautomatic Image Retrieval Using the High Level Semantic Labels

Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges related to each of these approaches, guide the researchers to use combining approaches and semi-automatic retrieval using the user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provided two kind of qu...


Semi-automatic acquisition of domain-specific semantic structures

This paper describes a methodology for semi-automatic grammar induction from unannotated corpora belonging to a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive towards language understanding. Our work aims to ameliorate the reliance of grammar development on expert handcrafting or the availability of annotated corpora. To strive for a reasonab...


Ontoprima: a Prototype for Automating Ontology Population

Ontology Population supports the process of building ontologies in the complex task of instantiating ontology. Performing this process manually is both expensive and time consuming; this logically leads to attempts of fully or partially automating the process of acquisition and absorption of knowledge in general and the process of Ontology Population in particular. This paper presents OntoPRiMa...


Extracting a Domain-Specific Ontology from a Corporate Intranet

This paper describes our actual and ongoing work in supporting semi-automatic ontology acquisition from a corporate intranet of an insurance company. A comprehensive architecture and a system for semiautomatic ontology acquisition supports processing semi-structured information (e.g. contained in dictionaries) and natural language documents and including existing core ontologies (e.g. GermaNet,...



Journal:
  • IEEE Trans. Knowl. Data Eng.

Volume 14 (issue and page numbers not listed)

Publication date: 2002